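Topological data analysis is a relatively new area of study with applications in areas such as data mining and computer vision. Its main problems are how to infer high-dimensional structure from low-dimensional representations, and how to assemble discrete points into global structure.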
The human brain can easily extract global structure from representations in a strictly lower dimension; for example, we infer a 3D environment from the 2D image received by each eye. The inference of global structure also occurs when converting discrete data into continuous images: dot-matrix printers and televisions, for instance, communicate images via arrays of discrete points.
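The main method used by topological data analysis is to replace a set of data points with a family of simplicial complexes indexed by a proximity parameter, and then to analyse these complexes via algebraic topology, specifically via persistent homology.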
Data is often represented as points in a Euclidean n-dimensional space $E^n$. The global shape of the data may provide information about the phenomena that the data represent.
One type of data set for which global features are certainly present is the so-called point cloud data coming from physical objects in 3D. For example, a laser can scan an object at a set of discrete points, and the cloud of such points can be used in a computer representation of the object. More generally, point cloud data refers to any collection of points in $E^n$, or to a (perhaps noisy) sample of points on a lower-dimensional subset.
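As an illustration of a noisy sample on a lower-dimensional subset, the following minimal sketch draws points near a unit circle (a 1-dimensional subset) sitting inside 3-dimensional Euclidean space. The function name `sample_noisy_circle`, the Gaussian noise model and the parameter values are assumptions chosen for the example; they do not come from any particular scanner or data set.

```python
import math
import random

def sample_noisy_circle(n_points, noise=0.05, seed=0):
    """Return n_points samples near the unit circle in the plane z = 0."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    cloud = []
    for _ in range(n_points):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        # Perturb each coordinate independently to imitate measurement noise
        # (hypothetical noise model, used only for illustration).
        x = math.cos(theta) + rng.gauss(0.0, noise)
        y = math.sin(theta) + rng.gauss(0.0, noise)
        z = rng.gauss(0.0, noise)
        cloud.append((x, y, z))
    return cloud

cloud = sample_noisy_circle(200)
```

Each point has three coordinates, yet the underlying shape is a closed loop; this is the kind of global feature that the methods described in this article aim to recover.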
For point clouds in low-dimensional spaces, the fields of computer graphics and statistics offer numerous approaches for inferring features from planar projections. Topological data analysis is needed when the spaces are high-dimensional or the data are too twisted to allow planar projections.
To convert a point cloud in a metric space into a global object, use the point cloud as the vertices of a graph whose edges are determined by proximity, then turn the graph into a simplicial complex and use algebraic topology to study it. An alternative approach is geometric data clustering based on the minimum spanning tree.[1] If a group of data points forms a cluster, then the geometry of this point cloud can be determined.
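One common way to carry out the graph-to-complex step is the Vietoris-Rips construction: two points are joined by an edge when they lie within a chosen distance of each other, and a triangle is filled in whenever all three of its edges are present. The sketch below assumes Euclidean distance and stops at triangles (dimension 2) for brevity; the names `rips_complex` and `epsilon` are illustrative, and other constructions (for example the Čech complex) are equally possible.

```python
import math
from itertools import combinations

def rips_complex(points, epsilon):
    """Vietoris-Rips complex of `points` at scale `epsilon`, up to dimension 2."""
    n = len(points)
    vertices = [(i,) for i in range(n)]
    # Proximity graph: join two points when they are at most epsilon apart.
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= epsilon]
    edge_set = set(edges)
    # Fill in a triangle whenever all three of its edges are in the graph.
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if (i, j) in edge_set and (i, k) in edge_set and (j, k) in edge_set]
    return vertices, edges, triangles

# Corners of a unit square: the sides have length 1, the diagonals about 1.41.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
vertices, edges, triangles = rips_complex(square, epsilon=1.1)
```

At `epsilon=1.1` only the four sides of the square are present, so the complex is a closed loop with no filled triangles. Raising `epsilon` to 1.5 adds both diagonals and all four triangles, and the loop disappears. The choice of scale therefore changes the topology that is detected.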
See homology for an introduction to the notation.
Persistent homology calculates homology groups at different resolutions to see which features persist over a wide range of resolutions. It is assumed that important features and structures are the ones that persist. We define persistent homology as follows:
Let $K^l$ be a filtration. The $p$-persistent $k$th homology group of $K^l$ is $H_k^{l,p} = Z_k^l / (B_k^{l+p} \cap Z_k^l)$.
If we let $z$ be a nonbounding $k$-cycle created at time $i$ by simplex $\sigma$ and let $z' \sim z$ be a homologous $k$-cycle that becomes a boundary cycle at time $j$ by simplex $\tau$, then we can define the persistence interval associated to $z$ as $(i, j)$. We call $\sigma$ the creator of $z$ and $\tau$ the destroyer of $z$. If $z$ does not have a destroyer, its persistence is $\infty$.
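The pairing of creators with destroyers can be computed by the standard column reduction of the boundary matrix. The following is a minimal sketch under stated assumptions: coefficients are taken in Z/2, the filtration adds one simplex per time step, and every face of a simplex appears before the simplex itself. The function name `persistence_intervals` and the tuple-based input format are illustrative only and are not the interface of any particular software package.

```python
def persistence_intervals(filtration):
    """Persistence intervals of an index-based filtration over Z/2.

    `filtration` lists simplices (tuples of vertex ids) in order of
    appearance. Returns (dimension, creation_index, destruction_index)
    triples; the destruction index is None for a class never destroyed.
    """
    simplices = [tuple(sorted(s)) for s in filtration]
    index = {s: i for i, s in enumerate(simplices)}
    # Column j holds the indices of the codimension-1 faces of simplex j.
    columns = []
    for s in simplices:
        if len(s) == 1:
            columns.append(set())  # a vertex has empty boundary
        else:
            columns.append({index[s[:m] + s[m + 1:]] for m in range(len(s))})

    destroyer_of = {}  # creator index -> destroyer index
    for j, col in enumerate(columns):
        # Add earlier columns (mod 2) until the lowest entry is unclaimed
        # or the column becomes empty.
        while col and max(col) in destroyer_of:
            col ^= columns[destroyer_of[max(col)]]
        if col:
            # Simplex j destroys the class created by simplex max(col).
            destroyer_of[max(col)] = j

    # A simplex whose reduced column is empty created a new cycle class.
    return [(len(simplices[i]) - 1, i, destroyer_of.get(i))
            for i, col in enumerate(columns) if not col]

# A filled triangle built one simplex at a time (indices 0 to 6).
triangle = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 2), (0, 1, 2)]
intervals = persistence_intervals(triangle)
# intervals == [(0, 0, None), (0, 1, 3), (0, 2, 4), (1, 5, 6)]
```

In this example the first vertex creates a connected component that is never destroyed, the other two vertices create components that are destroyed when edges 3 and 4 connect them, and the loop created by edge 5 is destroyed when the triangle at index 6 fills it in.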
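Instead of using an index-based filtration, we can use a time-based filtration. Let $K$ be a simplicial complex and $K^\rho = \{\sigma^i \in K \mid \rho(\sigma^i) \le \rho\}$ be a filtration defined for an associated map $\rho : S(K) \to \mathbb{R}$ that maps simplices in the final complex to real numbers. Then for all real numbers $\pi \ge 0$, the $\pi$-persistent $k$th homology group of $K^\rho$ is $H_k^{\rho,\pi} = Z_k^\rho / (B_k^{\rho+\pi} \cap Z_k^\rho)$. The persistence of a $k$-cycle created at time $\rho_i$ and destroyed at $\rho_j$ is $\rho_j - \rho_i$. [2]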
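For a concrete time-based filtration, the sketch below computes the persistence of the 0-dimensional classes (connected components) of a point cloud. It assumes, purely for illustration, that every vertex enters at time 0 and every edge enters at a time equal to its Euclidean length; under that assumption the persistence of a component (destruction time minus creation time) is simply the length of the edge that merges it into another component. The function name `h0_persistence` is hypothetical.

```python
import math
from itertools import combinations

def h0_persistence(points):
    """Persistence of the 0-dimensional classes of a point cloud.

    Assumed filtration: vertices enter at time 0, and each edge enters
    at a time equal to its Euclidean length.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process edges in order of the time at which they enter the filtration.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    persistence = []
    for time, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # The edge merges two components, destroying one class
            # that was created at time 0.
            parent[root_i] = root_j
            persistence.append(time - 0.0)
    if n:
        persistence.append(math.inf)  # one component is never destroyed
    return persistence

points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
values = h0_persistence(points)
# values == [1.0, 3.0, inf]
```

The two finite values correspond to the two merges at distances 1 and 3; the infinite value belongs to the single component that survives at every scale.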
There are various software packages for computing the persistence intervals of a finite filtration, such as jPlex, Dionysus and Perseus.